AITopics

Country:

Europe (0.28)
North America > United States > California > Orange County > Irvine (0.14)

Genre:

Research Report > Experimental Study (0.93)
Research Report > New Finding (0.92)

Industry: Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Social Media (1.00)
(4 more...)

Neural Information Processing Systems, Mar-26-2025, 05:41:00 GMT

Benchmark Data Repositories for Better Benchmarking

Rachel Longjohn

In machine learning research, it is common to evaluate algorithms via their performance on standard benchmark datasets. While a growing body of work establishes guidelines for--and levies criticisms at--data and benchmarking practices in machine learning, comparatively less attention has been paid to the data repositories where these datasets are stored, documented, and shared. In this paper, we analyze the landscape of these benchmark data repositories and the role they can play in improving benchmarking. This role includes addressing issues with both datasets themselves (e.g., representational harms, construct validity) and the manner in which evaluation is carried out using such datasets (e.g., overemphasis on a few datasets and metrics, lack of reproducibility). To this end, we identify and discuss a set of considerations surrounding the design and use of benchmark data repositories, with a focus on improving benchmarking practices in machine learning.

artificial intelligence, machine learning, natural language, (15 more...)

Country:

Europe (0.67)
North America > United States > California > Orange County > Irvine (0.14)

Genre: Research Report > Experimental Study (0.93)

Industry:

Information Technology (1.00)
Health & Medicine (1.00)
Government > Regional Government > North America Government > United States Government (1.00)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Vision > Face Recognition (0.46)

Li, Zhaoqin "Richie", Chen, Sicheng

Design and Implementation of an FPGA-Based Tiled Matrix Multiplication Accelerator for Transformer Self-Attention on the Xilinx KV260 SoM

arXiv.org Artificial Intelligence, Mar-20-2025

Transformer-based LLMs spend most of their compute in large matrix multiplications for attention and feed-forward layers. Recognizing that the Q, K, and V linear projections within the Multi-Head Self-Attention (MHA) module represent a critical computational bottleneck, we focused our efforts on accelerating these operations. We present a tiled matrix multiplication accelerator optimized for such workloads on a Xilinx KV260 on-board FPGA. Key innovations include persistent on-chip storage for one matrix operand, two-level tiling for data reuse, and a systolic-like unrolled compute engine. Implemented via high-level synthesis (HLS) and integrated with DistilBERT for Q, K, V projections, our accelerator achieves significant speedup and energy efficiency gains over CPU baselines. Standalone GEMM benchmarks show up to a 7x speedup over an ARM CPU (PyTorch) and ~200x over naive NumPy, with a throughput of up to 3.1 GFLOP/s on 768x3072 matrices. Although the overall end-to-end DistilBERT acceleration is more modest, our results validate the potential of FPGA-based acceleration for critical components of Transformer models.
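To make the tiling idea in the abstract above concrete, here is a minimal pure-Python sketch of a single-level tiled matrix multiplication. It is not the authors' HLS design: the function names, the `tile` parameter, and the single tiling level are assumptions chosen for illustration, and the code only shows how a block of the output is accumulated from small operand blocks that could be held in fast local storage.

```python
def matmul_naive(a, b):
    """Reference triple-loop product of two list-of-lists matrices."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]


def matmul_tiled(a, b, tile=2):
    """Same product, computed tile by tile (hypothetical tile size)."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    # Outer loops walk over tiles of the output and of the shared dimension.
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for p0 in range(0, k, tile):
                # Inner loops touch only one tile of a, b, and c, which is
                # the data an accelerator would keep in on-chip buffers.
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = c[i][j]
                        for p in range(p0, min(p0 + tile, k)):
                            acc += a[i][p] * b[p][j]
                        c[i][j] = acc
    return c
```

The `min(...)` bounds let the sketch handle matrix sizes that are not multiples of the tile size; a hardware design would more likely pad the operands instead.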

large language model, machine learning, natural language, (18 more...)

2503.16731

Country: North America > United States > California > Orange County > Irvine (0.14)

Genre: Research Report > New Finding (0.48)

Industry: Semiconductors & Electronics (0.61)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)

Vendrell-Gallart, Oriol, Negarandeh, Nima, Foumani, Zahra Zanjani, Amiri, Mahsa, Valdevit, Lorenzo, Bostanabad, Ramin

A preliminary data fusion study to assess the feasibility of Foundation Process-Property Models in Laser Powder Bed Fusion

arXiv.org Artificial Intelligence, Mar-20-2025

Foundation models are at the forefront of an increasing number of critical applications. With regard to technologies such as additive manufacturing (AM), these models have the potential to dramatically accelerate process optimization and, in turn, the design of next-generation materials. A major challenge that impedes the construction of foundation process-property models is data scarcity. To understand the impact of this challenge, and since foundation models rely on data fusion, in this work we conduct controlled experiments where we focus on the transferability of information across different material systems and properties. More specifically, we generate experimental datasets from 17-4 PH and 316L stainless steels (SSs) in Laser Powder Bed Fusion (LPBF) where we measure the effect of five process parameters on porosity and hardness. We then leverage Gaussian processes (GPs) for process-property modeling in various configurations to test if knowledge about one material system or property can be leveraged to build more accurate machine learning models for other material systems or properties. Through extensive cross-validation studies and probing the GPs' interpretable hyperparameters, we study the intricate relation among data size and dimensionality, complexity of the process-property relations, noise, and characteristics of machine learning models. Our findings highlight the need for structured learning approaches that incorporate domain knowledge in building foundation process-property models rather than relying on uninformed data fusion in data-limited applications.

artificial intelligence, correlation, machine learning, (16 more...)

2503.16667

Country: North America > United States > California > Orange County > Irvine (0.15)

Genre: Research Report > New Finding (1.00)

Industry:

Energy > Oil & Gas > Upstream (0.49)
Materials (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.84)

Neural Information Processing Systems, Mar-18-2025, 07:40:59 GMT

Learning to be Smooth: An End-to-End Differentiable Particle Smoother

For challenging state estimation problems arising in domains like vision and robotics, particle-based representations attractively enable temporal reasoning about multiple posterior modes. Particle smoothers offer the potential for more accurate offline data analysis by propagating information both forward and backward in time, but have classically required human-engineered dynamics and observation models. Extending recent advances in discriminative training of particle filters, we develop a framework for low-variance propagation of gradients across long time sequences when training particle smoothers. Our "two-filter" smoother integrates particle streams that are propagated forward and backward in time, while incorporating stratification and importance weights in the resampling step to provide low-variance gradient estimates for neural network dynamics and observation models. The resulting mixture density particle smoother is substantially more accurate than state-of-the-art particle filters, as well as search-based baselines, for city-scale global vehicle localization from real-world videos and maps.
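The abstract above mentions stratification in the resampling step. As a rough illustration, the following is a sketch of classical stratified resampling for a particle set, not the differentiable smoother described in the paper; the function name and the `rng` argument are assumptions made for this example. One uniform draw is taken inside each of the N equal-width strata of [0, 1), which lowers the variance of the resampled set compared with N independent draws.

```python
import bisect
import itertools
import random


def stratified_resample(weights, rng):
    """Return particle indices drawn by classical stratified resampling."""
    n = len(weights)
    total = sum(weights)
    # Cumulative distribution of the normalized importance weights.
    cumulative = list(itertools.accumulate(w / total for w in weights))
    indices = []
    for i in range(n):
        # One uniform sample inside the i-th stratum [i/n, (i+1)/n).
        u = (i + rng.random()) / n
        idx = bisect.bisect_right(cumulative, u)
        # Clamp guards against floating-point rounding at the upper end.
        indices.append(min(idx, n - 1))
    return indices
```

After resampling, the selected particles would typically be assigned equal weights of 1/N before the next propagation step.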

dataset, measurement model, particle, (16 more...)

Country: North America > United States > California > Orange County > Irvine (0.14)

Industry:

Information Technology (0.47)
Transportation (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.93)

Neural Information Processing Systems, Mar-18-2025, 04:51:09 GMT

A Scalable MIP-based Method for Learning Optimal Multivariate Decision Trees

Several recent publications report advances in training optimal decision trees (ODT) using mixed-integer programs (MIP), due to algorithmic advances in integer programming and a growing interest in addressing the inherent suboptimality of heuristic approaches such as CART. In this paper, we propose a novel MIP formulation, based on a 1-norm support vector machine model, to train a multivariate ODT for classification problems. We provide cutting plane techniques that tighten the linear relaxation of the MIP formulation, in order to improve run times to reach optimality. Using 36 data-sets from the University of California Irvine Machine Learning Repository, we demonstrate that our formulation outperforms its counterparts in the literature by an average of about 10% in terms of mean out-of-sample testing accuracy across the data-sets. We provide a scalable framework to train multivariate ODT on large data-sets by introducing a novel linear programming (LP) based data selection method to choose a subset of the data for training. Our method is able to routinely handle large data-sets with more than 7,000 sample points and outperform heuristic methods and other MIP-based techniques.
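For readers unfamiliar with the building block named in the abstract above, the standard soft-margin 1-norm support vector machine can be written as follows. This is the textbook formulation, not the paper's MIP; the symbols (training pairs $(x_i, y_i)$ with $y_i \in \{-1, +1\}$, weights $w$, offset $b$, slacks $\xi_i$, and penalty $C > 0$) are our own notation.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi}\quad & \lVert w \rVert_1 + C \sum_{i=1}^{n} \xi_i \\
\text{s.t.}\quad & y_i\,(w^\top x_i + b) \ge 1 - \xi_i, \quad i = 1,\dots,n, \\
& \xi_i \ge 0, \quad i = 1,\dots,n.
\end{aligned}
```

Writing $w = w^{+} - w^{-}$ with $w^{+}, w^{-} \ge 0$ and replacing $\lVert w \rVert_1$ by $\sum_j (w_j^{+} + w_j^{-})$ turns this into a linear program, which is what makes the 1-norm variant a natural fit for LP and MIP solvers.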

artificial intelligence, decision tree learning, machine learning, (18 more...)

Country: North America > United States > California > Orange County > Irvine (0.24)

Genre: Research Report (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.69)

Negarandeh, Nima, Mora, Carlos, Bostanabad, Ramin

SEEK: Self-adaptive Explainable Kernel For Nonstationary Gaussian Processes

arXiv.org Artificial Intelligence, Mar-18-2025

Gaussian processes (GPs) are powerful probabilistic models that define flexible priors over functions, offering strong interpretability and uncertainty quantification. However, GP models often rely on simple, stationary kernels, which can lead to suboptimal predictions and miscalibrated uncertainty estimates, especially in nonstationary real-world applications. In this paper, we introduce SEEK, a novel class of learnable kernels to model complex, nonstationary functions via GPs. Inspired by artificial neurons, SEEK is derived from first principles to ensure symmetry and positive semi-definiteness, key properties of valid kernels. The proposed method achieves flexible and adaptive nonstationarity by learning a mapping from a set of base kernels. Compared to existing techniques, our approach is more interpretable and much less prone to overfitting. We conduct comprehensive sensitivity analyses and comparative studies to demonstrate that our approach is not only robust to many of its design choices, but also outperforms existing stationary/nonstationary kernels in both mean prediction accuracy and uncertainty quantification.
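The abstract above stresses that a valid kernel must be symmetric and positive semi-definite. The sketch below is not the SEEK kernel; it is a generic, well-known construction shown only to illustrate how nonstationarity can be obtained from stationary base kernels without losing validity. Each base kernel is multiplied by an input-dependent amplitude on both arguments, and the results are summed. The gate functions, lengthscales, and all names are assumptions made for this example.

```python
import math


def rbf(x, y, lengthscale):
    """Stationary squared-exponential base kernel on scalars."""
    return math.exp(-0.5 * ((x - y) / lengthscale) ** 2)


def gate(x, slope):
    """Hypothetical input-dependent amplitude (a logistic function)."""
    return 1.0 / (1.0 + math.exp(-slope * x))


def nonstationary_kernel(x, y):
    """k(x, y) = sum_j a_j(x) * a_j(y) * k_j(x, y).

    Each term is a product of two PSD kernels, and sums of PSD kernels
    are PSD, so the result is symmetric and PSD by construction.
    """
    short = gate(x, 2.0) * gate(y, 2.0) * rbf(x, y, 0.3)   # dominates for x > 0
    long = gate(x, -2.0) * gate(y, -2.0) * rbf(x, y, 2.0)  # dominates for x < 0
    return short + long


def gram(points):
    """Gram matrix of the kernel on a list of scalar inputs."""
    return [[nonstationary_kernel(p, q) for q in points] for p in points]
```

With these hypothetical gates, the kernel behaves like a long-lengthscale RBF for negative inputs and a short-lengthscale RBF for positive inputs, so its value depends on where the inputs lie and not only on their difference.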

base kernel, kernel, nonstationary kernel, (16 more...)

2503.14785

Country: North America > United States > California > Orange County > Irvine (0.14)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)

Yang, Zhuoyi, Harris, Ian G.

LogLLaMA: Transformer-based log anomaly detection with LLaMA

arXiv.org Artificial Intelligence, Mar-18-2025

Log anomaly detection is the task of distinguishing anomalous log messages from normal ones. Transformer-based large language models (LLMs) are becoming popular for log anomaly detection because of their superb ability to understand complex and long language patterns. In this paper, we propose LogLLaMA, a novel framework that leverages LLaMA2. LogLLaMA is first finetuned on normal log messages from three large-scale datasets to learn their patterns. After finetuning, the model is capable of generating successive log messages given previous log messages. Our generative model is further trained to identify anomalous log messages using reinforcement learning (RL). The experimental results show that LogLLaMA outperforms the state-of-the-art approaches for anomaly detection on BGL, Thunderbird, and HDFS datasets.

data mining, large language model, machine learning, (22 more...)

2503.14849

Country: North America > United States > California > Orange County > Irvine (0.14)

Genre:

Research Report > New Finding (0.48)
Research Report > Promising Solution (0.34)

Industry: Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Seng, Jonas, Busch, Florian Peter, Prasad, Pooja, Dhami, Devendra Singh, Mundt, Martin, Kersting, Kristian

Scaling Probabilistic Circuits via Data Partitioning

arXiv.org Artificial Intelligence, Mar-11-2025

Probabilistic circuits (PCs) enable us to learn joint distributions over a set of random variables and to perform various probabilistic queries in a tractable fashion. Though the tractability property allows PCs to scale beyond non-tractable models such as Bayesian Networks, scaling training and inference of PCs to larger, real-world datasets remains challenging. To remedy the situation, we show how PCs can be learned across multiple machines by recursively partitioning a distributed dataset, thereby unveiling a deep connection between PCs and federated learning (FL). This leads to federated circuits (FCs) -- a novel and flexible FL framework that (1) allows one to scale PCs in distributed learning environments, (2) trains PCs faster, and (3) unifies, for the first time, horizontal, vertical, and hybrid FL in one framework by re-framing FL as a density estimation problem over distributed datasets. We demonstrate FC's capability to scale PCs on various large-scale datasets. Also, we show FC's versatility in handling horizontal, vertical, and hybrid FL within a unified framework on multiple classification tasks.
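As a toy illustration of the connection between data partitioning and circuit structure described in the abstract above, the sketch below builds a very small probabilistic circuit over binary variables. It is not the federated circuits algorithm: each horizontal data shard (standing in for one machine) fits a product node of independent Bernoulli leaves, and a sum node mixes the shards with weights proportional to shard size. All function names and the Laplace smoothing are assumptions made for this example.

```python
def fit_product(rows):
    """Fit independent Bernoulli leaves; returns P(X_j = 1) per variable."""
    n, d = len(rows), len(rows[0])
    # Laplace smoothing keeps every leaf probability strictly inside (0, 1).
    return [(sum(r[j] for r in rows) + 1) / (n + 2) for j in range(d)]


def product_prob(params, evidence):
    """Product node: multiply leaf values for the observed variables.

    evidence maps a variable index to 0 or 1. Unobserved variables are
    marginalized out by letting their leaves evaluate to 1.
    """
    p = 1.0
    for j, theta in enumerate(params):
        if j in evidence:
            p *= theta if evidence[j] == 1 else 1.0 - theta
    return p


def fit_circuit(shards):
    """Sum node over per-shard product nodes, weighted by shard size."""
    total = sum(len(s) for s in shards)
    return [(len(s) / total, fit_product(s)) for s in shards]


def circuit_prob(circuit, evidence):
    """Evaluate a joint or marginal query in one bottom-up pass."""
    return sum(w * product_prob(params, evidence) for w, params in circuit)
```

For example, `circuit_prob(circuit, {0: 1})` returns the marginal probability that the first variable equals 1 without enumerating the other variables, which is the kind of tractable query the abstract refers to.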

artificial intelligence, deep learning, machine learning, (18 more...)

2503.08141

Country: North America > United States > California > Orange County > Irvine (0.14)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

arXiv.org Artificial Intelligence, Mar-11-2025

TransECG: Leveraging Transformers for Explainable ECG Re-identification Risk Analysis

Wang, Ziyu, Khatibi, Elahe, Kazemi, Kianoosh, Azimi, Iman, Mousavi, Sanaz, Malik, Shaista, Rahmani, Amir M.

Electrocardiogram (ECG) signals are widely shared across multiple clinical applications for diagnosis, health monitoring, and biometric authentication. While valuable for healthcare, they also carry unique biometric identifiers that pose privacy risks, especially when ECG data is shared across multiple entities. These risks are amplified in shared environments, where re-identification threats can compromise patient privacy. Existing deep learning re-identification models prioritize accuracy but lack explainability, making it challenging to understand how the unique biometric characteristics encoded within ECG signals are recognized and utilized for identification. Without these insights, despite high accuracy, developing secure and trustworthy ECG data-sharing frameworks remains difficult, especially in diverse, multi-source environments. In this work, we introduce TransECG, a Vision Transformer (ViT)-based method that uses attention mechanisms to pinpoint critical ECG segments associated with re-identification tasks like gender, age, and participant ID. Our approach demonstrates high accuracy (89.9% for gender, 89.9% for age, and 88.6% for ID re-identification) across four real-world datasets with 87 participants. Importantly, we provide key insights into the role of ECG components such as the R-wave, QRS complex, and P-Q interval in re-identification. For example, in gender classification, the R-wave contributed 58.29% to the model's attention, while in age classification, the P-R interval contributed 46.29%. By combining high predictive performance with enhanced explainability, TransECG provides a robust solution for privacy-conscious ECG data sharing, supporting the development of secure and trusted healthcare data environments.

classification, data mining, machine learning, (18 more...)

2503.13495

Country: North America > United States > California > Orange County > Irvine (0.14)

Genre: Research Report (0.82)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Diagnostic Medicine (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Data Science > Data Mining (0.89)